occlusion detection
OccluNet: Spatio-Temporal Deep Learning for Occlusion Detection on DSA
Kore, Anushka A., Nijenhuis, Frank G. te, van der Sluijs, Matthijs, van Zwam, Wim, Majoie, Charles, Nijeholt, Geert Lycklama à, Ruijters, Danny, Vos, Frans, Cornelissen, Sandra, Su, Ruisheng, van Walsum, Theo
Accurate detection of vascular occlusions during endovas-cular thrombectomy (EVT) is critical in acute ischemic stroke (AIS). Interpretation of digital subtraction angiography (DSA) sequences poses challenges due to anatomical complexity and time constraints. This work proposes OccluNet, a spatio-temporal deep learning model that integrates YOLOX, a single-stage object detector, with transformer-based temporal attention mechanisms to automate occlusion detection in DSA sequences. We compared OccluNet with a YOLOv11 baseline trained on either individual DSA frames or minimum intensity projections. Two spatio-temporal variants were explored for OccluNet: pure temporal attention and divided space-time attention. Evaluation on DSA images from the MR CLEAN Registry revealed the model's capability to capture temporally consistent features, achieving precision and recall of 89.02% and 74.87%, respectively. OccluNet significantly outperformed the baseline models, and both attention variants attained similar performance. Source code is available here.
A-MFST: Adaptive Multi-Flow Sparse Tracker for Real-Time Tissue Tracking Under Occlusion
Chen, Yuxin, Wu, Zijian, Schmidt, Adam, Salcudean, Septimiu E.
Purpose: Tissue tracking is critical for downstream tasks in robot-assisted surgery. The Sparse Efficient Neural Depth and Deformation (SENDD) model has previously demonstrated accurate and real-time sparse point tracking, but struggled with occlusion handling. This work extends SENDD to enhance occlusion detection and tracking consistency while maintaining real-time performance. Methods: We use the Segment Anything Model2 (SAM2) to detect and mask occlusions by surgical tools, and we develop and integrate into SENDD an Adaptive Multi-Flow Sparse Tracker (A-MFST) with forward-backward consistency metrics, to enhance occlusion and uncertainty estimation. A-MFST is an unsupervised variant of the Multi-Flow Dense Tracker (MFT). Results: We evaluate our approach on the STIR dataset and demonstrate a significant improvement in tracking accuracy under occlusion, reducing average tracking errors by 12 percent in Mean Endpoint Error (MEE) and showing a 6 percent improvement in the averaged accuracy over thresholds of 4, 8, 16, 32, and 64 pixels. The incorporation of forward-backward consistency further improves the selection of optimal tracking paths, reducing drift and enhancing robustness. Notably, these improvements were achieved without compromising the model's real-time capabilities. Conclusions: Using A-MFST and SAM2, we enhance SENDD's ability to track tissue in real time under instrument and tissue occlusions.
Left/Right Hand Segmentation in Egocentric Videos
Betancourt, Alejandro, Morerio, Pietro, Barakova, Emilia, Marcenaro, Lucio, Rauterberg, Matthias, Regazzoni, Carlo
Wearable cameras allow people to record their daily activities from a user-centered (First Person Vision) perspective. Due to their favorable location, wearable cameras frequently capture the hands of the user, and may thus represent a promising usermachine interaction tool for different applications. Existent First Person Vision methods handle hand segmentation as a backgroundforeground problem, ignoring two important facts: i) hands are not a single "skin-like" moving element, but a pair of interacting cooperative entities, ii) close hand interactions may lead to hand-to-hand occlusions and, as a consequence, create a single hand-like segment. These facts complicate a proper understanding of hand movements and interactions. Our approach extends traditional background-foreground strategies, by including a hand-identification step (left-right) based on a Maxwell distribution of angle and position. Hand-to-hand occlusions are addressed by exploiting temporal superpixels. The experimental results show that, in addition to a reliable left/right hand-segmentation, our approach considerably improves the traditional background-foreground hand-segmentation. Keywords: Hand-Segmentation, Hand-identification, Egocentric Vision, First Person Vision 1. Introduction The recent widespread availability of wearable devices has quickly attracted the interest of researchers, computer scientists and high-tech companies [1]. The 90's idea of a body-worn device that is always ready to be used is nowadays possible, and its potential applicability to real problems is evident. In general, the wearable sensor that most attracted researchers' attention is the video camera: while enjoying a unique position to record what the user is seeing, it suffers from important issues and technical challenges [2]. Images and videos recorded from this perspective are commonly referred to as First-Person Vision (FPV) or Egocentric videos [2].
Occlusion Detection and Motion Estimation with Convex Optimization
Ayvaci, Alper, Raptis, Michalis, Soatto, Stefano
We tackle the problem of simultaneously detecting occlusions and estimating optical flow. We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex minimization problem. Therefore, the solution, computed using efficient algorithms, is guaranteed to be globally optimal, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance.